73 research outputs found
Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement the acoustic detection of the active speaker, thus
improving the system robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their face. Furthermore, the method does not rely on
external annotations, thus complying with cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker dependent setting. However, in a speaker
independent setting the proposed method yields a significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions.
Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
Interactive Robot Learning of Gestures, Language and Affordances
A growing field in robotics and Artificial Intelligence (AI) research is
human-robot collaboration, whose target is to enable effective teamwork between
humans and robots. However, in many situations human teams are still superior
to human-robot teams, primarily because human teams can easily agree on a
common goal with language, and the individual members observe each other
effectively, leveraging their shared motor repertoire and sensorimotor
resources. This paper shows that for cognitive robots it is possible, and
indeed fruitful, to combine knowledge acquired from interacting with elements
of the environment (affordance exploration) with the probabilistic observation
of another agent's actions.
We propose a model that unites (i) learning robot affordances and word
descriptions with (ii) statistical recognition of human gestures with vision
sensors. We discuss theoretical motivations, possible implementations, and we
show initial results which highlight that, after having acquired knowledge of
its surrounding environment, a humanoid robot can generalize this knowledge to
the case when it observes another agent (human partner) performing the same
motor actions previously executed during training.
Comment: code available at https://github.com/gsaponaro/glu-gesture
Cluster Analysis of Differential Spectral Envelopes on Emotional Speech
This paper reports on the analysis of the spectral variation of emotional speech. Spectral envelopes of time-aligned speech frames are compared between emotionally neutral and active utterances. Statistics are computed over the resulting differential spectral envelopes for each phoneme. Finally, these statistics are grouped using agglomerative hierarchical clustering with a measure of dissimilarity between statistical distributions, and the resulting clusters are analysed. The results show that there are systematic changes in spectral envelopes when going from neutral to sad or happy speech, and that those changes depend on the valence of the emotional content (negative, positive) as well as on the phonetic properties of the sounds, such as voicing and place of articulation.
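The clustering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the dissimilarity measure or the linkage rule, so the symmetric Kullback-Leibler divergence between diagonal Gaussians, the average linkage, and the names `gaussian_symmetric_kl` and `cluster_phonemes` are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gaussian_symmetric_kl(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal Gaussians.

    Assumed dissimilarity; the abstract only says "a measure of
    dissimilarity between statistical distributions".
    """
    sq_diff = (mean_a - mean_b) ** 2
    kl_ab = 0.5 * np.sum(np.log(var_b / var_a) + (var_a + sq_diff) / var_b - 1.0)
    kl_ba = 0.5 * np.sum(np.log(var_a / var_b) + (var_b + sq_diff) / var_a - 1.0)
    return kl_ab + kl_ba


def cluster_phonemes(diff_envelopes, n_clusters):
    """Group phonemes by the statistics of their differential envelopes.

    diff_envelopes maps a phoneme label to an array of shape
    (n_frames, n_bins) holding emotional-minus-neutral envelopes.
    Returns a dict mapping each phoneme label to a cluster id.
    """
    names = sorted(diff_envelopes)
    # Per-phoneme mean and variance; the small constant avoids division by zero.
    stats = [
        (diff_envelopes[p].mean(axis=0), diff_envelopes[p].var(axis=0) + 1e-8)
        for p in names
    ]
    n = len(names)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = gaussian_symmetric_kl(*stats[i], *stats[j])
            dist[i, j] = dist[j, i] = d
    # Agglomerative clustering on the condensed dissimilarity matrix.
    tree = linkage(squareform(dist), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels))
```

Average linkage is used here because it accepts an arbitrary precomputed dissimilarity, whereas Ward linkage presumes Euclidean distances.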
S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction
We address the video prediction task by putting forth a novel model that
combines (i) our recently proposed hierarchical residual vector quantized
variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN
(ST-PixelCNN). We refer to this approach as a sequential hierarchical residual
learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging
the intrinsic capabilities of HR-VQVAE in modeling still images with a
parsimonious representation, combined with the ST-PixelCNN's ability to
handle spatiotemporal information, S-HR-VQVAE can better deal with chief
challenges in video prediction. These include learning spatiotemporal
information, handling high dimensional data, combating blurry prediction, and
implicit modeling of physical characteristics. Extensive experimental results
on the KTH Human Action and Moving-MNIST tasks demonstrate that our model
compares favorably against top video prediction techniques both in quantitative
and qualitative evaluations despite a much smaller model size. Finally, we
boost S-HR-VQVAE by proposing a novel training method to jointly estimate the
HR-VQVAE and ST-PixelCNN parameters.
Comment: 14 pages, 7 figures, 3 tables. Submitted to IEEE Transactions on
Pattern Analysis and Machine Intelligence on 2023-07-1
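As a rough illustration of the residual quantization idea behind the model name, the sketch below quantizes a vector with a stack of codebooks, each layer coding what the previous layers left unexplained. It is plain residual vector quantization under simplifying assumptions, not the HR-VQVAE architecture itself: there is no encoder, decoder, hierarchy linking the codebooks, or training, and the function name `residual_quantize` is hypothetical.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Quantize x with a stack of codebooks applied to successive residuals.

    x is a 1-D vector; each codebook is an array of shape (n_codes, dim).
    Returns the chosen code index per layer and the summed reconstruction.
    """
    residual = np.asarray(x, dtype=float).copy()
    reconstruction = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Pick the code vector nearest to the current residual.
        distances = np.sum((codebook - residual) ** 2, axis=1)
        k = int(np.argmin(distances))
        indices.append(k)
        reconstruction += codebook[k]
        # The next layer only has to explain what is still missing.
        residual -= codebook[k]
    return indices, reconstruction
```

Each added layer can only reduce or preserve the reconstruction error when its codebook contains the zero vector, which is one reason small stacked codebooks can give a compact representation.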
Hierarchical Analysis of the Differential Spectral Envelopes of an Emotional Voice (Analisi gerarchica degli inviluppi spettrali differenziali di una voce emotiva)
This article describes a new method for analysing vocal timbre by studying the variations in spectral envelope used by the same speaker in emotionally neutral and expressive situations. The analysis is based on a corpus from a single speaker who was instructed to read a series of sentences in a neutral reading style and then in two emotional modes: a happy style and a sad style. The spectral envelopes of the time-aligned neutral and expressive (happy and sad) utterances are compared using a differential method. The differences were computed between the emotional state and the neutral one, so the two categories compared are neutral-happy and neutral-sad. Statistics of the differential envelopes were computed for each phone. The data were examined using agglomerative hierarchical clustering. The resulting clusters are validated with several measures of distance between statistical distributions and explored visually to find similarities and differences between the two categories. The results highlight systematic variations in vocal timbre across the two sets of spectral envelope differences. These traits depend on the valence of the emotion considered (positive, negative) as well as on the phonetic properties of the particular phone, such as voicing and place of articulation.
User Evaluation of the SYNFACE Talking Head Telephone
The talking-head telephone, Synface, is a lip-reading support for people with hearing impairment. It has been tested by 49 users with varying degrees of hearing impairment in the UK and Sweden, in lab and home environments. Synface was found to support the users, especially in perceiving numbers and addresses, and to be an enjoyable way to communicate. A majority deemed Synface to be a useful product.
Bidirectional fluxes of spermine across the mitochondrial membrane.
The polyamine spermine is transported into the
mitochondrial matrix by an electrophoretic mechanism
having as driving force the negative electrical membrane
potential (ΔΨ). The presence of phosphate increases
spermine uptake by reducing ΔpH and enhancing ΔΨ. The
transport system is a specific uniporter constituted by a
protein channel exhibiting two asymmetric energy barriers
with the spermine binding site located in the energy well
between the two barriers. Although spermine transport is
electrophoretic in origin, its accumulation does not follow
the Nernst equation because of the presence of an efflux pathway.
Spermine efflux may be induced by different agents, such as
FCCP, antimycin A and mersalyl, able to completely or
partially reduce the ΔΨ value and, consequently, suppress
or weaken the force necessary to maintain spermine in the
matrix. However, this efflux may also take place in normal
conditions when the electrophoretic accumulation of the
polycationic polyamine induces a drop in ΔΨ sufficient
to trigger the efflux pathway. The release of the polyamine
is most probably electroneutral in origin and can take place
in exchange with protons or in symport with phosphate
anion. The activity of both the uptake and efflux pathways
induces a continuous cycling of spermine across the
mitochondrial membrane, the rate of which may be prominent
in setting the concentrations of spermine in the inner and
outer compartments. Thus, this event has a significant role in
the modulation of the mitochondrial permeability transition
and, consequently, in the triggering of intrinsic apoptosis.
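To make the remark about the Nernst equation concrete, the sketch below computes the equilibrium accumulation ratio that a purely electrophoretic uptake would predict for an ion of charge z at membrane potential ΔΨ, using [in]/[out] = exp(-zFΔΨ/RT). The function name, the temperature of 37 °C, and the treatment of spermine as a tetravalent cation are assumptions for illustration and are not taken from the abstract.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_accumulation_ratio(delta_psi_mv, charge, temperature_k=310.15):
    """Equilibrium [inside]/[outside] ratio predicted by the Nernst equation.

    delta_psi_mv is the membrane potential in mV, inside relative to
    outside (negative for energized mitochondria). charge is the ion
    valence z. temperature_k defaults to 37 degrees C (an assumption).
    """
    delta_psi_v = delta_psi_mv / 1000.0
    return math.exp(-charge * F * delta_psi_v / (R * temperature_k))
```

For example, `nernst_accumulation_ratio(-180.0, 4)` gives the ratio a tetravalent cation would reach at an illustrative potential of -180 mV; it is many orders of magnitude above 1, which is why an efflux pathway that keeps the real accumulation below this prediction matters.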